Exploratory Data Analysis
Before looking at the data, a few simple preprocessing steps are necessary. Some of the numeric features in the data, like quality scores, are actually categorical in nature. We’ll transform those features into factors, and we’ll also identify and separate discrete features from continuous features so they can be examined separately. We’ll also combine the train and test sets so that we can easily compare them and check that feature distributions are approximately identical.
# Combine data sets with distinction between train and test
home_train[,split := "train"]
home_test[,split := "test"]
home_test[,SalePrice := NA]
home_data <- rbind(home_train, home_test)
# Convert features into proper format based on data_description.txt
home_data[, c("MSSubClass", "OverallQual", "OverallCond") := list(
  as.factor(MSSubClass),
  as.factor(OverallQual),
  as.factor(OverallCond)
)]
# Number of discrete and continuous features
sapply(home_data, class) %>%
  table()
character    factor   integer
       44         3        35
# Define dependent variable
y <- "SalePrice"
# Prefix column names that start with a digit (e.g. 1stFlrSF) with "h" so they are syntactically valid R names
digit_names <- names(home_data)[str_detect(names(home_data), "^[0-9]")]
setnames(home_data, digit_names, paste0("h", digit_names))
# Extract names of continuous features
cont_features <- names(home_data)[sapply(home_data, class) == "integer"]
cont_features <- setdiff(cont_features, c(y, "Id"))
# Extract names of discrete features
disc_features <- names(home_data)[sapply(home_data, class) %in% c("character", "factor")]
disc_features <- setdiff(disc_features, "split")
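The train/test comparison mentioned above can be illustrated on a small scale. The following is a minimal sketch in base R, assuming an invented toy data frame (the `toy` object and its values are hypothetical and not taken from the housing data). It tabulates the share of each level of a discrete feature within each split:

```r
# Toy stand-in for the combined data (values are invented for illustration)
toy <- data.frame(
  split  = c("train", "train", "train", "train", "test", "test", "test", "test"),
  Street = c("Pave", "Pave", "Pave", "Grvl", "Pave", "Pave", "Grvl", "Pave"),
  stringsAsFactors = FALSE
)

# Counts of each level within each split
level_counts <- table(toy$split, toy$Street)

# Convert counts to within-split proportions (each row sums to 1)
level_share <- prop.table(level_counts, margin = 1)
print(round(level_share, 2))
```

If the rows of the resulting table look similar, the feature is distributed comparably in both splits; large gaps would suggest that the splits differ.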
Response Variable
Before jumping into examining the features, it is important to understand the response variable, in this case SalePrice. The following histogram illustrates the distribution of SalePrice in the data.

The histogram shows that SalePrice is right skewed. To reduce the skew, we’ll apply a log transformation and compare the resulting distribution with the original one in the following plot.
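home_data[, log_SalePrice := log(SalePrice)]
# Histogram of sale price on the original scale (needed for the comparison below)
sp_hist <- home_data %>%
  ggplot(aes(x = SalePrice)) +
  geom_histogram() +
  theme_minimal() +
  labs(title = "Histogram of Sale Price")
# Histogram of log sale price
lsp_hist <- home_data %>%
  ggplot(aes(x = log_SalePrice)) +
  geom_histogram() +
  theme_minimal() +
  labs(title = "Histogram of log(SalePrice)")
# Side-by-side comparison of original distribution with log distribution
plot_grid(sp_hist, lsp_hist)
# Name of the log-transformed response, used by later chunks
y2 <- "log_SalePrice"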
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Removed 1459 rows containing non-finite values (stat_bin).

The log transformation reduces the skewness of the data, so the remainder of the analysis will use log_SalePrice.
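The effect of the log transformation can also be checked numerically rather than by eye. The following is a minimal sketch in base R; the `skewness` helper and the `toy_price` vector are assumptions made for illustration (the prices are invented, not drawn from the housing data), and the helper uses the simple moment-based definition of sample skewness:

```r
# Sample skewness: third central moment divided by the cubed standard deviation
skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Invented right-skewed prices (not taken from the housing data)
toy_price <- c(100, 120, 130, 150, 160, 180, 200, 250, 400, 755) * 1000

skew_raw <- skewness(toy_price)
skew_log <- skewness(log(toy_price))

# The log transformation pulls in the long right tail, so skew_log is smaller
print(c(raw = skew_raw, log = skew_log))
```

A value near zero indicates a roughly symmetric distribution, so a drop in skewness after taking logs supports working on the log scale.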
Missingness
Now, we want to check for missing values. The VIM package has some nice functions for visualizing missing values, so we’ll use those. Essentially, we want to look for features with a high volume of missing values, and we also want to see if there are prevalent patterns in features that are often missing together. Such patterns could indicate a latent variable that isn’t being measured but that we could account for with a dummy variable.

There are quite a few columns dominated by missing values. In addition, a couple of combinations of missing values occur together rather frequently, possibly an indication of some latent variable. To help machine learning models pick up on these similar observations, we’ll add a missingness feature to the data to encode common missingness combinations.
# Add missingness feature
# MiscFeature is checked rather than MiscVal: MiscVal is a numeric amount and is not expected to be NA
home_data[, missingness := "other"]
home_data[is.na(Alley) & is.na(PoolQC) & is.na(Fence) & is.na(MiscFeature), missingness := "one"]
home_data[is.na(Alley) & is.na(PoolQC) & is.na(Fence) & is.na(MiscFeature) & is.na(FireplaceQu), missingness := "two"]
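The co-missingness patterns described above can also be tabulated directly, which helps when choosing the combinations to encode. The following is a minimal sketch in base R, assuming an invented toy data frame (the `toy` object and its values are hypothetical); it is not the VIM-based approach used in this analysis:

```r
# Toy data with missing values (invented for illustration)
toy <- data.frame(
  Alley   = c(NA, NA, "Grvl", NA, NA),
  PoolQC  = c(NA, NA, NA, "Gd", NA),
  Fence   = c(NA, "MnPrv", NA, NA, NA),
  LotArea = c(8450, 9600, 11250, 9550, 14260),
  stringsAsFactors = FALSE
)

# Share of missing values per column
na_share <- colMeans(is.na(toy))

# Label each row by the set of columns that are missing in it
na_pattern <- apply(is.na(toy), 1, function(row_is_na) {
  paste(names(toy)[row_is_na], collapse = "+")
})

# Most frequent co-missingness patterns first
pattern_counts <- sort(table(na_pattern), decreasing = TRUE)
print(na_share)
print(pattern_counts)
```

Patterns that appear near the top of `pattern_counts` are the natural candidates for their own level in a feature such as missingness.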
Discrete Features
Now, we will perform a more in-depth examination of the data by looking at discrete features first. We’re interested in determining a few things. First, we want to see how many observations there are of each value for each discrete feature. Features that appear too sparse may need to be removed from the data. Second, we want to look at each level of each discrete feature and how it relates to log_SalePrice to determine which features provide strong signals for log_SalePrice. To address the first point, we will create bar plots for each discrete feature. To address the second point, we will look at boxplots for each of the same features.
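The sparsity check described above can also be approximated numerically before looking at any plots. The following is a minimal sketch in base R, assuming an invented toy data frame and a hypothetical 85% cutoff (neither comes from the analysis itself). It flags features where a single level covers almost all rows:

```r
# Toy discrete features (values invented for illustration)
toy <- data.frame(
  Street   = c("Pave", "Pave", "Pave", "Pave", "Pave", "Pave", "Pave", "Grvl"),
  BldgType = c("1Fam", "1Fam", "TwnhsE", "Duplex", "1Fam", "TwnhsE", "1Fam", "Duplex"),
  stringsAsFactors = FALSE
)

# Share of rows taken by the most common level of each feature
top_level_share <- sapply(toy, function(x) max(table(x)) / length(x))

# Hypothetical cutoff: flag features where one level covers more than 85% of rows
sparse_features <- names(top_level_share)[top_level_share > 0.85]
print(top_level_share)
print(sparse_features)
```

The cutoff is a judgment call; a flagged feature may still be worth keeping if its rare levels relate strongly to the response.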
# Count of distinct values in each discrete feature
disc_bar <- home_data[, c(disc_features, "split"), with = FALSE] %>%
  melt(id.vars = "split", measure.vars = disc_features) %>%
  ggplot(aes(x = value, fill = split)) +
  geom_bar(na.rm = FALSE, position = "dodge") +
  facet_wrap(~variable, scales = "free") +
  theme_tufte() +
  theme(axis.text.y = element_blank(),
        axis.text.x = element_text(angle = 45)) +
  labs(title = "Discrete Feature Barplots")  # labs() needs the title passed by name
# Boxplot of response variable for discrete features
disc_box <- home_data[split == "train", c(disc_features, y2, "split"), with = FALSE] %>%
  melt(id.vars = c("split", y2), measure.vars = disc_features) %>%
  ggplot(aes(x = value, y = log_SalePrice)) +
  geom_boxplot() +
  facet_wrap(~variable, scales = "free") +
  theme_tufte() +
  theme(axis.text.y = element_blank(),
        axis.text.x = element_text(angle = 45)) +
  labs(title = "Discrete Feature Boxplots")  # labs() needs the title passed by name
plot_grid(disc_bar, disc_box, nrow = 2)

The above plots illustrate some important insights. First, Street, Utilities, and PoolQC each appear to be rather sparse. However, PoolQC does seem to have a strong influence on log_SalePrice. There are several additional variables that seem to strongly indicate log_SalePrice. Specifically, OverallQual and OverallCond have rather distinct patterns. To determine whether these two variables can be further exploited, we’ll look at how they interact with one another. The following heatmaps illustrate the interaction between OverallQual and OverallCond. The heatmap on the left illustrates how frequently various combinations of these features occur together, while the heatmap on the right illustrates the average log_SalePrice for each combination.
n_heat <- home_data[, .N, by = .(split, OverallQual, OverallCond)] %>%
  ggplot(aes(x = OverallQual, y = OverallCond, fill = N)) +
  geom_raster() +
  facet_wrap(~split) +  # one panel per split so train and test tiles do not overplot
  theme_minimal() +
  labs(title = "Heatmap of OverallCond and OverallQual")
lsp_heat <- home_data[split == "train", .(avg_log_SalePrice = mean(log_SalePrice)), by = .(OverallCond, OverallQual)] %>%
  ggplot(aes(x = OverallQual, y = OverallCond, fill = avg_log_SalePrice)) +
  geom_raster() +
  theme_minimal() +
  labs(title = "Heatmap of OverallCond and OverallQual and log_SalePrice")
plot_grid(n_heat, lsp_heat)

As can be seen from the heatmaps, the majority of homes are sold with OverallQual between 4 and 8 and OverallCond between 5 and 7. This pattern remains consistent across test and train data. As expected, average log_SalePrice increases as both OverallQual and OverallCond increase.
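The two quantities behind the heatmaps above are a count and a mean for each combination of ratings. The following is a minimal sketch in base R of those two aggregations, assuming an invented toy data frame (the values are hypothetical, not from the housing data):

```r
# Toy ratings and prices (invented for illustration)
toy <- data.frame(
  OverallQual = c(5, 5, 6, 6, 7, 7),
  OverallCond = c(5, 5, 5, 6, 5, 6),
  SalePrice   = c(120000, 140000, 160000, 170000, 210000, 230000)
)
toy$log_SalePrice <- log(toy$SalePrice)

# Left heatmap: how often each combination occurs
combo_counts <- table(OverallCond = toy$OverallCond, OverallQual = toy$OverallQual)

# Right heatmap: average log price per combination (NA where no homes exist)
combo_means <- tapply(
  toy$log_SalePrice,
  list(OverallCond = toy$OverallCond, OverallQual = toy$OverallQual),
  mean
)
print(combo_counts)
print(round(combo_means, 2))
```

Cells that are `NA` in `combo_means` correspond to combinations with no observations, which is worth keeping in mind when reading averages from thinly populated cells.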
Continuous Features
Now that we have at least some idea of what is happening with the discrete features, we’ll take a look at the continuous features. First, we’ll look at a correlation plot of the continuous features to determine the strength of the relationships both among features and between features and log_SalePrice.
# Correlation matrix
house_corplot <- home_data[split == "train", c(cont_features, y2), with = FALSE] %>%
  correlate() %>%
  rearrange() %>%
  shave() %>%
  rplot()
house_corplot +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))


The correlation plot indicates that several features dealing with the size of the home are strongly correlated with log_SalePrice. Many of these features are also correlated with one another. While this would be an issue for ordinary linear models, whose coefficient estimates become unstable when predictors are collinear, the tree-based methods we will use should be far less sensitive to collinearity.
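The collinearity noted above can also be listed explicitly instead of being read off the plot. The following is a minimal sketch in base R, assuming an invented toy data frame and a hypothetical cutoff of 0.8 for the absolute correlation (the column names exist in the housing data, but the values here are made up):

```r
# Toy continuous features (values invented for illustration)
toy <- data.frame(
  GrLivArea    = c(850, 1200, 1500, 1710, 2100, 2600),
  TotRmsAbvGrd = c(4, 5, 6, 7, 8, 10),
  YearBuilt    = c(1995, 1950, 2005, 1920, 1980, 1965)
)

# Pairwise Pearson correlations
cor_mat <- cor(toy)

# Keep each pair once (upper triangle) and apply the hypothetical 0.8 cutoff
idx <- which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  feature_1 = rownames(cor_mat)[idx[, "row"]],
  feature_2 = colnames(cor_mat)[idx[, "col"]],
  r         = cor_mat[idx],
  stringsAsFactors = FALSE
)
print(high_pairs)
```

In this toy example only the two size-related features are flagged, which mirrors the kind of redundancy among size features described above.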

# Scatter plot pairs
home_data[split == "train", c(cont_features, y), with = FALSE] %>%
  ggpairs()
Quick Initial Model
# Move data to h2o (training only needed now)
fwrite(home_data[split == "train"], "../data/train_data_eng.csv")
train_h <- h2o.importFile("../data/train_data_eng.csv", destination_frame = "train_h")
home_df_rf <- h2o.randomForest(
  x = c(cont_features, disc_features),
  y = y,
  training_frame = train_h,
  model_id = "home_df_rf"
)
home_df_rf
h2o.varimp(home_df_rf) %>%
  as.data.table() %>%
  .[, .(variable, scaled_importance = round(scaled_importance, 2), percentage = round(percentage, 2))]
Random Explorings (DO NOT INCLUDE IN FINAL REPORT)
# Have home prices changed over time?
home_data %>%
  ggplot(aes(x = SalePrice)) +
  geom_histogram() +
  facet_grid(YrSold ~ MoSold) +
  theme_tufte()
home_data %>%
  ggplot(aes(x = as.factor(YrSold), y = SalePrice)) +
  geom_boxplot() +
  theme_tufte()
# Slight downward trend in home prices but nothing crazy
# Time series plot (ts() is used because as.ts() ignores frequency for a plain vector)
home_data[, .(Avg_Sale_Price = mean(SalePrice, na.rm = TRUE)), by = .(YrSold, MoSold)][order(YrSold, MoSold), Avg_Sale_Price] %>%
  ts(frequency = 12) %>%
  plot()
home_data[,table(YrSold, MoSold, split)]